-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
coreos-boot-mount-generator: Always use mpath for /boot if for root #1022
coreos-boot-mount-generator: Always use mpath for /boot if for root #1022
Conversation
We're currently gating on `ENV{DM_SUSPENDED}=="Active"` but `10-dm.rules` does: ``` ENV{DM_SUSPENDED}=="Active", ENV{DM_SUSPENDED}="0" ENV{DM_SUSPENDED}=="Suspended", ENV{DM_SUSPENDED}="1" ``` So what I think is happening here is that our rule happens to run before that kicks in, so we make the links once, but not thereafter. Change the condition to match what `13-dm-disk.rules` from LVM is doing. Also slightly reorder the code and add some comments for extra clarity.
This doesn't textually conflict with #1011 Although the PR commit says:
Ooh. Right, so the code here then needs to be refined to check whether multipath is in the stack. But the rdcore code already knows that right? Feels like we should be able to communicate the expected path to |
ENV{DM_ACTIVATION}!="1", GOTO="dm_label_end" | ||
ENV{ID_FS_LABEL_ENC}!="?*", GOTO="dm_label_end" | ||
ENV{ID_FS_UUID_ENC}!="?*", GOTO="dm_label_end" | ||
ENV{DM_SUSPENDED}=="1", GOTO="dm_label_end" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I think this will help with some of the udev issues I'm working around in #1011.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to this area of the code, I've also removed the DM_ACTIVATION
line in #1011, which makes the symlinks even more solid!
# If the root device is multipath, hook up /boot to use that too, | ||
# based on our custom udev rules in 90-coreos-device-mapper.rules | ||
# that creates "label found on mpath" links. | ||
# Otherwise, use the usual by-label symlink. | ||
rootdev="$(findmnt -nvr -o SOURCE /)" | ||
case "${rootdev}" in | ||
/dev/mapper/mpath*) bootdev=/dev/disk/by-label/dm-mpath-boot;; | ||
*) bootdev=/dev/disk/by-label/boot;; | ||
esac |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think here we should key off of getargbool 0 rd.multipath
instead as we do in #1011 (note not rd.multipath=default
necessarily to be forward compatible with supporting non-default configs). That also avoids having to deal with the LUKS-on-multipath case entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conceptually though there's a distinction between "this technology is in use" and "this technology is used for these devices".
It looks like dracut interprets rd.multipath
as "must start multipathd in the initramfs". OK, fine. But that doesn't (outside of what you're proposing) normally imply that multipath is necessarily used e.g. for the root filesystem.
We are using the root=/dev/disk/by-label/dm-mpath-root
as a way to express that.
This problem circles back a bit to whether or not we want to support having /boot
on a physically distinct storage from the root filesystem. I am skeptical of the value of that. But OTOH it will work just fine if someone does it manually today so we probably need to not surprise people doing it by having their systems fail to boot after an upgrade.
So...I think e.g. Anaconda ends up being completely explicit about what needs to be done, you have things like rd.lvm.lv
to tell dracut to wait for a specific LV in the initramfs, and root=/dev/mapper/fedora-root
to wait for that block device. The /boot
bit is in /etc/fstab
so not normally mounted in the initramfs, but on traditional systems w/client side initramfs generation you have the target /etc/fstab
in the initramfs, so the generator can pull that in too.
I think what we're iterating towards here is then we need to support being explicit about where to find /boot
. Which I think basically should mirror how we handle /
with a missing root=
karg effectively meaning root=/dev/disk/by-label/root
.
Now, we need to support the current case of rd.multipath=default
. I think if boot=
is not present, then we implicitly make it boot=/dev/disk/by-label/dm-mpath-boot
.
But if there is a boot=
karg, then we use it as the path to the device.
Now - we're being more explicit, but I think this may in turn require your proposed support for "day 1 multipath" to inject a boot=
karg or so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like dracut interprets rd.multipath as "must start multipathd in the initramfs". OK, fine. But that doesn't (outside of what you're proposing) normally imply that multipath is necessarily used e.g. for the root filesystem.
I'm 100% with you there (see last paragraph of #1011 (comment)). In an ideal scenario, we don't have any handling of rd.multipath
at all because all we really care about is that multipathd took ownership of the devices. So here, we'd have the boot mount have something like After=multipathd-settled.target
.
A major reason multipath is special compared to other storage configuration options is that the label we look for is already present even before multipath takes ownership. Whereas you don't have that with the rootfs on e.g. LUKS, where the symlink simply doesn't appear before it's actually ready to be used. Or e.g. in full-disk RAID1, the constituent boot partitions don't have a boot
label.
But zooming out here, it must be that the boot partition will always have label boot
because that's how GRUB itself finds it and boots the machine (apart from the special logic it has for RAID1). So a boot
karg would be redundant in that respect and introduce another source of truth. It'd only really be useful for the multipath scenario, where we need to deal with that funky situation of /dev/disk/by-label/boot
changing meaning. But again, in that case I'd prefer to figure out a way with multipath devs to get what we really want (wait until ownership is taken and symlinks updated).
Until then, I think it's OK if we have rd.multipath
implying that the boot partition (and the root partition at least on the first boot prior to any possible reconfiguration) is on multipath. And while conceptually I agree that it shouldn't, there isn't really any other reason to have multipath enabled in the initrd since the primary purpose of the initrd is to set up the rootfs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, good argument! Indeed we can't make boot
too special because GRUB has to read it. (But...on the flip side GRUB has unfortunately grown more stuff over time, though it seems unlikely to do multipath).
If root is on multipath (which is today for CoreOS always `rd.multipath=default`) then we *know* we must use it for `/boot`. We're not going to support "tearing" where `/boot` is on a non-mpath device but `/` is on mpath. The current code is I believe racy because at the time the generator runs (and systemd generators run *early*), we're querying the "current" properties of the device at `/dev/disk/by-label/boot`. But multipathd could still be in the process of setting up and replacing the target of that symlink. This can cause systemd to tear down and reinitialize the mount, causing races. https://bugzilla.redhat.com/show_bug.cgi?id=1944660
560b996
to
55723e8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One note, but LGTM as is too.
# See discussion in https://github.com/coreos/fedora-coreos-config/pull/1022 | ||
bootdev=/dev/disk/by-label/boot | ||
# Yes this isn't a real karg parser but we're trapped in this shell script | ||
if grep -q rd.multipath /proc/cmdline; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: this will incorrectly trigger on rd.multipath=0
(see https://github.com/dracutdevs/dracut/blob/9355cb8ea5024533210067373657dc337d63ecb9/modules.d/90multipath/multipathd.service#L12). I mean, a user really shouldn't have to do that, though still would be good to be resilient here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in the real root; I'm uncertain about pulling in dracut-lib.sh
into the real root. We could try to ship rdcore
in the real root too and move this there?
Or I could special case rd.multipath=default
for now...
Or it looks like coreos-liveiso-autologin-generator
has have_karg
that we could try to factor out into a little shell library shared by our shell generators...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do this as is so I can cherry pick it, and figure out a cleaner karg handling later?
Hm we don't have an |
Hmm yeah I'd say it's about time now. Maybe let's ask on Slack though I'd lean towards cutting from where openshift/os is currently pinning. |
OK, will give this a bit of time for anyone else to weigh in otherwise I'll just create |
(Thinking about this a bit more, perhaps what would be most ergonomic actually would be keeping openshift/os |
|
udev/90-coreos-device-mapper: Create label links in real root too
We're currently gating on
ENV{DM_SUSPENDED}=="Active"
but10-dm.rules
does:So what I think is happening here is that our rule happens to
run before that kicks in, so we make the links once, but not
thereafter.
Change the condition to match what
13-dm-disk.rules
fromLVM is doing.
Also slightly reorder the code and add some comments for extra clarity.
coreos-boot-mount-generator: Always use mpath for /boot if for root
If root is on multipath, then we know we must use it for
/boot
.The current code is I believe racy because at the time the generator
runs, we're querying the current properties of the device at
/dev/disk/by-label/boot
. But multipathd could still be inthe process of setting up and replacing what that symlink
points to.
https://bugzilla.redhat.com/show_bug.cgi?id=1944660